AITopics | tail probability

Collaborating Authors

tail probability

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Gaussian Approximation and Concentration of Constant Learning-Rate Stochastic Gradient Descent

Neural Information Processing SystemsJun-18-2026, 19:36:50 GMT

We establish a comprehensive finite-sample and asymptotic theory for stochastic gradient descent (SGD) with constant learning rates. First, we propose a novel linear approximation technique to provide a quenched central limit theorem (CLT) for SGD iterates with refined tail properties, showing that regardless of the chosen initialization, the fluctuations of the algorithm around its target point converge to a multivariate normal distribution. Our conditions are substantially milder than those required in the classical CLTs for SGD, yet offering a stronger convergence result. Furthermore, we derive the first Berry-Esseen bound - the Gaussian approximation error - for the constant learning-rate SGD, which is sharp compared to the decaying learning-rate schemes in the literature. Beyond the moment convergence, we also provide the Nagaev-type inequality for the SGD tail probabilities by adopting the autoregressive approximation techniques, which entails non-asymptotic largedeviation guarantees. These results are verified via numerical simulations, paving the way for theoretically grounded uncertainty quantification, especially with non-asymptotic validity.

artificial intelligence, inequality, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Sharp Concentration Inequalities: Phase Transition and Mixing of Orlicz Tails with Variance

Shen, Yinan, Lv, Jinchi

arXiv.org Machine LearningMar-30-2026

In this work, we investigate how to develop sharp concentration inequalities for sub-Weibull random variables, including sub-Gaussian and sub-exponential distributions. Although the random variables may not be sub-Guassian, the tail probability around the origin behaves as if they were sub-Gaussian, and the tail probability decays align with the Orlicz $Ψ_α$-tail elsewhere. Specifically, for independent and identically distributed (i.i.d.) $\{X_i\}_{i=1}^n$ with finite Orlicz norm $\|X\|_{Ψ_α}$, our theory unveils that there is an interesting phase transition at $α= 2$ in that $\PPł(ł|\sum_{i=1}^n X_i \r| \geq t\r)$ with $t > 0$ is upper bounded by $2\expł(-C\maxł\{\frac{t^2}{n\|X\|_{Ψ_α}^2},\frac{t^α}{ n^{α-1} \|X\|_{Ψ_α}^α}\r\}\r)$ for $α\geq 2$, and by $2\expł(-C\minł\{\frac{t^2}{n\|X\|_{Ψ_α}^2},\frac{t^α}{ n^{α-1} \|X\|_{Ψ_α}^α}\r\}\r)$ for $1\leq α\leq 2$ with some positive constant $C$. In many scenarios, it is often necessary to distinguish the standard deviation from the Orlicz norm when the latter can exceed the former greatly. To accommodate this, we build a new theoretical analysis framework, and our sharp, flexible concentration inequalities involve the variance and a mixing of Orlicz $Ψ_α$-tails through the min and max functions. Our theory yields new, improved concentration inequalities even for the cases of sub-Gaussian and sub-exponential distributions with $α= 2$ and $1$, respectively. We further demonstrate our theory on martingales, random vectors, random matrices, and covariance matrix estimation. These sharp concentration inequalities can empower more precise non-asymptotic analyses across different statistical and machine learning applications.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

arXiv.org Machine Learning

2603.25934

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Higher Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.66)
Information Technology > Modeling & Simulation (0.54)

Add feedback

6ffa1f5ad26addef897dcb938e525db7-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-14-2026, 01:46:07 GMT

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Add feedback

6ffa1f5ad26addef897dcb938e525db7-Paper-Conference.pdf

Neural Information Processing SystemsFeb-14-2026, 01:46:03 GMT

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.51)

Add feedback

63c3ddcc7b23daa1e42dc41f9a44a873-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 16:16:18 GMT

debiased, objective, probability, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

M-Statistic for Kernel Change-Point Detection

Shuang Li, Yao Xie, Hanjun Dai, Le Song

Neural Information Processing SystemsOct-9-2025, 13:59:20 GMT

Detecting the emergence of an abrupt change-point is a classic problem in statistics and machine learning. Kernel-based nonparametric statistics have been proposed for this task which make fewer assumptions on the distributions than traditional parametric approach. However, none of the existing kernel statistics has provided a computationally efficient way to characterize the extremal behavior of the statistic. Such characterization is crucial for setting the detection threshold, to control the significance level in the offline case as well as the average run length in the online case. In this paper we propose two related computationally efficient M -statistics for kernel-based change-point detection when the amount of background data is large. A novel theoretical result of the paper is the characterization of the tail probability of these statistics using a new technique based on change-of-measure. Such characterization provides us accurate detection thresholds for both offline and online cases in computationally efficient manner, without the need to resort to the more expensive simulations such as bootstrapping. We show that our methods perform well in both synthetic and real world data.

artificial intelligence, machine learning, threshold, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

6ffa1f5ad26addef897dcb938e525db7-Supplemental-Conference.pdf

Neural Information Processing SystemsOct-8-2025, 21:25:13 GMT

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Add feedback

Stochastic Multi-armed Bandits: Optimal Trade-off among Optimality, Consistency, and Tail Risk

Neural Information Processing SystemsOct-8-2025, 21:25:08 GMT

Such decaying rate is proved to be best achievable.

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.87)

Add feedback

A Proofs of Theoretical Results

Neural Information Processing SystemsOct-3-2025, 02:12:37 GMT

Lemma 1. F or any embedding f and finite N, we have L Theorem 3. F or any embedding f and finite N and M, we have e L By Jensen's inequality, we may push the absolute value inside the expectation to see that The outer expectation disappears since the tail probably bound of Theorem A.2 holds uniformly for all fixed x, x We still owe the reader a proof of Lemma A.2, which we give now. We then proceed to bound the right hand tail probability. Combining Lemma A.3 and Lemma A.4, with probability at least 1, for all f 2F, we have L Note the definition of g is slightly modified in this context. We again use the Adam optimizer with learning rate 0 . To implement the debiased objective, we only modify the "src/s2v-model.py"

artificial intelligence, machine learning, objective, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Filters

Collaborating Authors

tail probability

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Gaussian Approximation and Concentration of Constant Learning-Rate Stochastic Gradient Descent

Sharp Concentration Inequalities: Phase Transition and Mixing of Orlicz Tails with Variance

6ffa1f5ad26addef897dcb938e525db7-Supplemental-Conference.pdf

6ffa1f5ad26addef897dcb938e525db7-Paper-Conference.pdf

63c3ddcc7b23daa1e42dc41f9a44a873-Supplemental.pdf

M-Statistic for Kernel Change-Point Detection

6ffa1f5ad26addef897dcb938e525db7-Supplemental-Conference.pdf

Stochastic Multi-armed Bandits: Optimal Trade-off among Optimality, Consistency, and Tail Risk

A Proofs of Theoretical Results

5a66b9200f29ac3fa0ae244cc2a51b39-Supplemental.pdf